TESTING DMPfold2

I have installed DMPfold2 and tested it on an alignment I had previously built from an initial (not definitive) dataset of the family based on the PFAM domain ‘Gemini_BL1’ (PF00845). The initial dataset was obtained by running PSI-BLAST (e-value 0.005) and fetching the corresponding amino acid sequences. As the query I used the HMM consensus sequence that PFAM uses to identify the presence of this domain in a sequence. The dataset used to test DMPfold2 contains 155 proteins with an average length of 297 amino acids. They were aligned with MUSCLE (MUltiple Sequence Comparison by Log-Expectation) using the default parameters set by MEGA, which produced the attached multiple sequence alignment (MSA).
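The dataset figures quoted above (number of proteins and average length) can be checked directly on the FASTA file exported from MEGA. The following is a minimal sketch, not the procedure actually used for this report: the function names `read_fasta` and `dataset_summary` are hypothetical, and the toy alignment only stands in for the real MSA. Gap characters are removed before measuring length, and records with duplicate headers would be merged into one.

```python
from statistics import mean


def read_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def dataset_summary(records):
    """Return (number of sequences, mean ungapped length)."""
    lengths = [len(seq.replace("-", "")) for seq in records.values()]
    return len(lengths), mean(lengths)


# Toy alignment standing in for the real MSA (hypothetical data).
toy = """>seq1
MKT--LV
>seq2
MKTAALV
>seq3
M-T-ALV
"""

n, avg = dataset_summary(read_fasta(toy))
print(f"{n} sequences, mean ungapped length {avg:.1f}")
```

For the real file, the text would be read with `open(path).read()` and passed to `read_fasta` in the same way.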


FIGURE 1. Visualization of part of the alignment of proteins with the ‘Gemini_BL1’ domain, produced with the MUSCLE algorithm in MEGA.

 

To evaluate the efficiency of DMPfold2, I ran the program while varying the number of iteration cycles and the number of geometry minimization steps. Each run used 24 threads, half of the threads available on the machine (Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz).

  1. 1000 iterations + 1000 minimization steps (~ 2.5 hours)
  2. 10000 iterations + 1000 minimization steps (~ 20 hours)

By contrast, when the same MSA is submitted to the trRosetta server (the third configuration tested), the job completes in less than 2.5 hours. The drawback is that the confidence of the predicted model is low (estimated TM-score = 0.245; a TM-score of 0.5 or higher usually indicates a model with correctly predicted topology) [1]. The predicted protein structures were visualized with UCSF Chimera.
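As a rough illustration of how the DMPfold2 runtime scaled in these two runs, the arithmetic can be sketched as follows. The times are the approximate wall-clock values reported above and include the 1000 minimization steps, so the per-iteration figures are only indicative. The function names are hypothetical.

```python
# Approximate wall-clock times reported above: (label, iterations, hours).
runs = [
    ("DMPfold2, 1000 iterations", 1000, 2.5),
    ("DMPfold2, 10000 iterations", 10000, 20.0),
]


def hours_per_1000_iterations(iterations, hours):
    """Average cost of a block of 1000 iterations, in hours."""
    return hours / (iterations / 1000)


def slowdown(short_run, long_run):
    """Ratio of wall-clock times between two runs."""
    return long_run[2] / short_run[2]


for name, iterations, hours in runs:
    cost = hours_per_1000_iterations(iterations, hours)
    print(f"{name}: {cost:.2f} h per 1000 iterations")

print(f"10x more iterations took {slowdown(runs[0], runs[1]):.0f}x longer")
```

With these figures, ten times more iterations cost about eight times more time, so the longer run was slightly cheaper per iteration.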

SOME NOTES ABOUT STRUCTURE EVALUATION

  • The Z-score summarizes the total energy of the model. ProSA uses it to assess the quality of the generated model by comparing it with the Z-scores of experimentally determined protein structures of similar size.
  • The Ramachandran diagram is used to check whether the backbone conformation of each residue is plausible, namely whether the torsion angles Φ (phi) and Ψ (psi) fall within the energetically favorable, allowed regions.
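To make the Ramachandran angles concrete, the sketch below computes a torsion (dihedral) angle from four points in space. For residue i, Φ is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and Ψ is the dihedral defined by N(i), CA(i), C(i), N(i+1). This is an illustrative pure-Python implementation, not the code used by MolProbity; the function name `dihedral` and the test coordinates are assumptions made for the example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180..180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the fourth point is rotated 90 degrees
# around the central bond relative to the first.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to the backbone atoms of each residue in a predicted model, the resulting (Φ, Ψ) pairs are the points plotted in a Ramachandran diagram.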

In terms of quality, the models generated by DMPfold2 were overall not as good as the model obtained with trRosetta, as shown by the outputs of the structure evaluation programs (Figures 2-4 and Tables 1-3).

From my point of view, it would be worth trying AlphaFold2, given its accuracy in predicting protein structures (it is now open source), and comparing its results on this same MSA.

 

  1. DMPfold2 with 1000 iterations + 1000 minimization steps

3D STRUCTURE REPRESENTATION

 

EVALUATION OF THE MODEL

TABLE 1. Summary statistics of the geometrical features when the MSA of the proteins of the ‘Gemini_BL1’ domain family is given to DMPfold2 with 1000 iterations and 1000 minimization steps. The statistics were calculated with MolProbity [2].


FIGURE 2. Results of the evaluation of the structural model from DMPfold2 (1000 iterations, 1000 minimization steps): A) local model quality, B) overall model quality [3] and C) Ramachandran plots [2].

 

  2. DMPfold2 with 10000 iterations + 1000 minimization steps

3D STRUCTURE REPRESENTATION

 

EVALUATION OF THE MODEL

TABLE 2. Summary statistics of the geometrical features when the MSA of the proteins of the ‘Gemini_BL1’ domain family is given to DMPfold2 with 10000 iterations and 1000 minimization steps. The statistics were calculated with MolProbity [2].

FIGURE 3. Results of the evaluation of the structural model from DMPfold2 (10000 iterations, 1000 minimization steps): A) local model quality, B) overall model quality [3] and C) Ramachandran plots [2].

 

  3. trRosetta server (default options)

3D STRUCTURE REPRESENTATION

 

EVALUATION OF THE MODEL

TABLE 3. Summary statistics of the geometrical features when the MSA of the proteins of the ‘Gemini_BL1’ domain family is given to trRosetta with default options. The statistics were calculated with MolProbity [2].


FIGURE 4. Results of the evaluation of the structural model from the trRosetta server (default options): A) local model quality, B) overall model quality [3] and C) Ramachandran plots [2].

 

REFERENCES

[1] Yang, J., Anishchenko, I., Park, H., Peng, Z., Ovchinnikov, S., & Baker, D. (2020). Improved protein structure prediction using predicted interresidue orientations. PNAS, 117: 1496-1503.
[2] Williams et al. (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein Science, 27: 293-315.
[3] Wiederstein, M., & Sippl, M. J. (2007). ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Research, 35(Suppl. 2).